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Abstract 

We extend the definition of functional data registration to encompass a larger class of 
registration models. In contrast to traditional registration models, we allow for registered 
functions that have more than one primary direction of variation. The proposed Bayesian 
hierarchical model simultaneously registers the observed functions and estimates the two 
primary factors that characterize variation in the registered functions. Each registered func¬ 
tion is assumed to be predominantly composed of a linear combination of these two primary 
factors, and the function-specific weights for each observation are estimated within the regis¬ 
tration model. We show how these estimated weights can easily be used to classify functions 
after registration using both simulated data and a juggling data set. 
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1 INTRODUCTION 

This paper extends current functional data registration methods to encompass a broader family of 
registration models. Traditional registration methods are designed to eliminate all phase variability 
in a set of functions so that amplitude variability in the registered functions can be described by one 
primary functional direction. A simple example can be found in Figure [TJ Here, each unregistered 
function, Xjt), in the center plot can be expressed as X t (t) = zu * f\(t + q) where /i(t) is the 
primary direction of variation in the registered functions. After registration, these functions only 
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exhibit amplitude variability in the direction of f\ (t) as seen in the illustration on the right. For 
a thorough discussion on the history of and current methods in functional registration, see Earls 
and Hooker [4]. 

We build on this traditional concept of functional registration, by considering unregistered 
functions for which eliminating phase variability in these functions results in registered functions 
that vary in two primary functional directions which we will denote fi{t) and / 2 (f). Allowing two 
primary functional directions of variation in the registered functions extends the use of functional 
registration to functional data sets such as that found in Figure [2j Here the composition of 
some of the registered functions includes a negative scaling of the second factor which confounds 
traditional approaches to registration. Considering these factors separately in the registration 
process is essential to eliminate phase variability in these functions. 

The registration model presented here is an extension of our previous work in traditional 
functional registration in the framework of a Bayesian hierarchical model, Earls and Hooker [4], 
In our previous work, we demonstrate this approach to functional registration not only allows for 
flexible modeling assumptions, but also results in estimates of registered functions that are similar 
to those determined by the best registration procedures currently available. Here we will extend 
this model to not only register functions with multiple directions of variation after registration, 
but also to perform factor analysis. For these models, approximate inference can be performed 
with an adapted variational Bayes algorithm that significantly reduces the computational time 
needed for initial estimates. Using these estimates are to initialize an MCMC sampling scheme 
eliminates the need for a burn-in period. Appendix B provides the details of this algorithm for the 
registration and factor analysis model. A complete discussion of the adapted variational Bayes 
(AVB) algorithm can be found in Earls and Hooker [4] where we also compare AVB estimates to 
those obtained by MCMC sampling for several data sets. 

There is no previous work that combines registration and factor analysis; however, in Kneip 
and Ramsay [Sj, the authors also consider registration where the aligned functions are assumed 
to contain variation in more than one functional direction. In their paper, Kneip and Ramsay 
register functions using an iterative algorithm that updates the PCA decomposition used to register 
functions in each iteration. This model can be seen as an extension of the Procrustes method for 
traditional functional registration, Ramsay and Li [US] and Ramsay and Silverman pTOj . Earls and 
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Figure 1: Example of Traditional Function Registration. Left The functional direction in which 
describes all variation in the registered functions, Center Each unregistered function is 

a scaling of the primary functional direction that is shifted horizontally. These horizontal shifts 
account for the phase variability in the data. Right The functions after registration. 

Hooker [4] demonstrates how our initial registration model improves upon the Procrustes method. 

The basic organization of this paper is as follows. We present our model for registration and 
factor analysis in Section [2] In Section [3] we compare our model for functional alignment to one 
of the best traditional registration methods using two simulated data sets. In this section, we 
also show how functions can be grouped according to their estimated weights on each of the two 
factors. In Section [4], we apply this model to a juggling data set. Finally, a discussion can be 
found in Section [5] 

2 FACTOR ANALYSIS MODEL FOR REGISTRATION 
AND GROUPING 

2.1 Informative Precision Matrices for Functional Data Registration 

We extend Earls and Hooker [1], on functional registration via Gaussian process models, to allow 
for more flexible assumptions in the structure of the registered functions. Using the classical defi¬ 
nition of functional registration, in Earls and Hooker [4], we propose a registration model designed 
to register functions that once registered have little variation from one functional direction. While 
appropriate for many statistical analyses, this registration model does not adequately register 
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Figure 2: Example of Expanded Function Registration. Left The two functional directions that 
describe all variation in the registered functions, fi(t) and / 2 (f). Center Each unregistered 
function is composed of a linear combination of f±(t) and $ 2 ( 1 ) with non-linear phase variation. 
Right The functions after registration. 

functions in which there are more than one primary direction of variation in the registered func¬ 
tions. As we will show in Section [3j other registration methods based on this traditional definition 
of registration also tend to perform poorly when the registered functions are composed of more 
than one primary direction of variation. 

Earls and Hooker [3] establishes that under the assumption that the registered functions vary 
insignificantly from one primary functional direction, the following data distribution is appropriate 
to register functions X*(t), i — 1,..., N. 

Xi(hi(t))\ zm, zu, f^t) ~ GP(z 0i + x;i i /i(t),7 1 “ 1 S(s,t)) s,teT ( 1 ) 

where Xi(hi(t )) is X t (t) registered under the warping function hi(t). The above covariance func¬ 
tion, 7b 1 E(s, t), penalizes all variance from a scaling and vertical shifting of the primary functional 
direction, f\ (t). In these models we will define 71 as a registration parameter that determines the 
severity of this penalty. This registration parameter is balanced by a penalty on the warping func¬ 
tions, — 1,..., N that penalizes distance from the identity warping. For more information 

on this model see Earls and Hooker [3], 

It is natural to extend this initial model to 

Xi{hi{t)) | z 0 i,z u J 1 (t),z 2 iif2(t) ~ GP(z 0i + z u fi{t) + z 2i f2(t),”/i 1 'E(s,t)) s,teT ( 2 ) 

However, this distribution penalizes variation from the first and second functional directions 
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(factors), fi(t) and / 2 (t), equally. For most data, variation in one of the factors will exceed varia¬ 
tion in the other factor. Accounting for this discrepancy in the statistical model for the registered 
functions not only provides a better registration, but also creates an identifiable relationship be¬ 
tween the two factors. We will thus proceed with the following distribution for the registered 
functions. 

Xi(hi(t )) | Zoi, Zn, fiit), Z2i, f?it) ~ GP{zoi + Zufift) H-y-— Z2if2(t), (71 + 72) 1 S(s, t)) S,teT 

71 + 72 

This distribution introduces separate penalties for 1) variation from the first functional direc¬ 
tion, f], controlled by registration parameter, 7 1; and 2) variation from a linear combination of f x 
and f 2 controlled by registration parameter 72 , 72 < 7i- 

Before establishing the basis for the exact specification of the distribution above, we note 
here, as is common with functional data, that each unregistered function, X t (t) 1 is assumed to 
be observed over a finite number of equally spaced time points, t = (£ 1; ... ,t p )'. Thus, given the 
above model, in practice we will proceed by using finite approximations to each function. In Earls 
and Hooker [ 3 j we establish some theoretical properties of these types of approximations. The 
following finite-dimensional distribution is used in the final model in lieu of its infinite dimensional 
counterpart above. For Xj(hj) = (7Q(/ij(H)), ..., X^hiitp))) 1 , i = 1,..., N, 

72 1 

Mhi) | z 0i , Zu, fi, z 2i , f 2 ~ N p (zoil + Zuii H--- Z2&, (71 + 72) S) ( 3 ) 

7i + 72 

The underlying principle for both the registration and factor analysis model presented here 
and the basic registration model described in Earls and Hooker [5J is the use of informative priors 
in a Bayesian hierarchical model. The mean vectors and precision matrices used in the prior 
distributions of the registered functions, ([!]) and (|3]), for these models are selected to define the 
types of variation allowable for functions that are fully registered. Explicitly defining proper 
covariance relationships for registered functions in these prior distributions results in posterior 
estimates of the registered functions that are registered by warping the time domain of each 
unregistered function until the covariance relationships in the resulting registered functions are 
optimal according to this prior information. 

For the registration and factor analysis model, we would like to use separate precision matrices 
in the prior on the registered functions to penalize registered function estimates for variation in 
directions other than 1) a scaling of the first factor, fi, and 2) a linear combination of the first 
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and second factors, fi and f 2 . Thus, we again utilize the precision matrix, E _1 , that is designed 
to penalize all variation from a given mean function and require the prior for the approximated 
registered functions, Xj(hj) to have the following property, 

Xj(hj) | Z 0 i,Z 1 i,f 1 ,Z 2 i,f 2 oc 

exp [ - ^((Xj(hj) - (z 0i lp + ^i ; :fi))' 7 i s_ 1 ( x i( h i) - (z Qi l p + £ H fi)))] * (4) 

exp { - ^(Xj((hj) - (z 0i l p + zufi + ^f 2 )) , 72 S- 1 (X l (h J ) - (z 0i l p + z u fi + ^ 2 ^ 2 )))] (5) 

where 71 > 72 so that variation in the registered functions in directions other than a scaling 
of the first factor Q is penalized more heavily than variation in directions other than a linear 
combination of both factors (J5]) (where both penalties account for vertical shifts). A specific 
definition of E can be found in Appendix A.l. 

After rearranging terms and determining the appropriate normalizing constant, this criterion 
results in prior distribution ([3]) for the registered functions. 
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2.2 Model Specifications 

The full data and prior distributions for the registration and factor analysis model assuming 
unregistered functions have been observed over t = (4 • • • t p )' are 

Xj(hj) | Zoi, Zn, fi, Z 2 i, {2 ~ N p (zoil + Z\ii\ -4 - - -£2^2,(71 + 72) X S) i — (6) 

7i + 72 

S = P 1 + P 2 (7) 

i 

h i(tj) = 4 + ^2(t k - t k _ 1 )e Wi( - tk - l) i = l,...,N j = l,...,p 


w i (x N p _ i(0, 7 ,/S + A U) 1 P w )l{ti + ^(4 - 4_i)e Wi(tfc - l} = t p } (8) 

k =2 




i = 1,- • • ,-ZV 



Pm 

= 

P 2 


(9) 
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Note: For simplicity, here we include only the finite dimensional representation of all functional 
model specifications. Keep in mind that all multivariate normal distributions above are derived 
from a Gaussian process distribution evaluated over (4,... ,t p )'. Throughout this paper we will 
refer to the functions and finite representations of the functions interchangeably. 

In this model, a, b, c, and d are hyper-parameters defining uninformative priors on the vari¬ 
ance components and smoothing parameters. The parameters, zoi, i = 1,... N, allow the registered 
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functions to vary by vertical shifts from a linear combinations of the two factors, fi(t) and / 2 (f). 
The constraint, zqn = — z ou ensures the average vertical shift is estimated to be 0. The 

parameters, Zu and Z 2 i,i — 1,... N , are the function specific weights for f\ (t) and ./d(^), respec¬ 
tively. 

In this paper, we will refer to the functions, Wi(t), t E T, from which the warping functions, 
hi(t), t gT, are derived, as the base functions. The base functions are non-parametrically specihed 
for optimal registration. We, however, impose the following restrictions on the warping functions: 

1. h(t i) = H 

2. h{t p ) = t p 

3. if tk > tj, then h{t k ) > h(tj ) for all t k ,tj E T 

Restrictions 1 and 3 are built into the definition of hi(t). Restriction 2 is imposed through 
the indicator function in the expression for the prior defined for each base function, Wi(t), (J8|. 
Furthermore, note that Wi(t) = 0 corresponds to the identity warping, h t (t) = t. The penalty 
matrix 5R 1 is utilized again in the prior for the base functions to penalize variation from the 
identity warping, with corresponding registration parameter ^ w . This penalty is necessary to 
avoid losing important features in each function due to extreme differences between registered and 
observed time. Additionally, P w is a matrix designed to penalize the second squared derivative of 
the base functions with corresponding parameter A^. This penalty is not always necessary but is 
included to allow for additional flexibility in penalizing significant departures from the identity in 
the warping functions. Here we will elaborate on not only this covariance specification, but the 
covariance specifications for all functional parameters. 

In the above model specifications, all covariance matrices are the evaluation over a finite grid of 
time points of a covariance function composed of a linear combinations of two bi-variate functions, 
Pi(s,t) and P 2 (s,t). P\ (s, t) penalizes variation in constant and linear functions and P 2 (s, t.) 
penalizes function variability in all other directions. Together they define a proper covariance 
function. For each covariance matrix above, the specification of the registration and smoothing 
parameters indicate the extent the two different types of variability should be penalized for each 
function. For example, for both the registered functions and the base functions, we want to penalize 
variation in any direction other than that of the mean function. The covariance specifications of 
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(71 + 72 ) and 7 W X S reflect these penalties, where the magnitude of the penalty is controlled by 
registration parameters, 71 , 72 , and 7 ^, (distributional assumptions [6|7| an We can use P 2 (s, t ) 

to penalize roughness in a given function. Here we would like the factors, fi(t) and / 2 (f), and the 
base functions to be smooth. This is achieved by the inclusion of A J 1 P 2 (s,t) and \~ 1 P 2 (s,t) in 
the priors for these functions (distributional assumptions [lOj [TTJ [l 2 j [IJ and [9| where the level of 
the penalty is controlled by the smoothing parameters A/ and X w . The inclusion of rjJ 1 Pi(s,t) in 
the covariance specification for f\ (t) and / 2 (f) is needed to define a proper covariance function for 
these distributions where ijf only needs to be large enough to ensure stability in the model. Note, 
7 / and A/ are considered as additional unknown parameters to be estimated through the model. 
For the exact definitions of Pi(s,t) and P 2 (s,t), see Earls and Hooker [3]. 

Short runs of the adapted variational Bayes algorithm introduced in Earls and Hooker [3] can 
be used to establish optimal registration parameters in this model. General guidelines include 
setting 72 < 71 , where 71 is at least a factor of 10 larger than 72 . 

In addition to allowing more flexibility in the shape of the registered functions, a bi-product of 
this analysis is the estimation of the two functional directions, fi(t) and / 2 (f), and the associated 
weights of these two factors for each function, zu and Z 21 , i = 1,..., N, respectively. These factors 
tend to have a more interpretable shape than principal components, and estimating the weights for 
each function provides a way to group registered functions. Examples are found in both Section 
[3] and Section [4] . 

As is typical with hierarchical models, all parameters can be estimated using MCMC samples 
from the joint posterior distribution. However, obtaining these samples in high-dimensional models 
can be expensive and time-consuming. In Earls and Hooker [@J we define and establish convergence 
properties for an adapted version of variational Bayes that can also be utilized here. Appendices 
A and B contain all of the model specifications, full-conditionals for a MCMC sampler, and details 
of the adapted variational Bayes algorithm. 

We note here that for many analyses it is desirable for registered time, t, to correspond to 
the average of the estimated warping functions over the sample. In other words for all 1 6 T, 
h.(t) = t. While this model does not inherently impose this restriction, it is straightforward to 
shift all estimated functions, post-registration, so that this requirement is satisfied. For details on 
how to perform this final adjustment see Earls and Hooker |3]j. 
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Figure 3: First Simulated Data Set. Top Left Original unregistered functions. Top Right 
Functions registered by F-R (R package ’fdasrvf’). Lower Left Functions registered by the FA 
model. Lower Right Estimated factors fi and f2. 
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Figure 4: Four groups determined by the centered weights, z 1 and z 2 . Top Left {X^h^t)) : Zu > 
0 ,z 2i > 0}. Top Right {Xi(hi(t)) : z u < 0 ,z 2i < 0} Lower Left {X^h^t)) : z u < 0 ,z 2i > 0} 
Lower Right {Xi(hi(t)) : z u > 0 ,z 2 i < 0} 
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3 COMPARISON TO CURRENT METHODS 


One of the best registration models currently available is that proposed by Srivistava, et.al. H7j. 
In their work, the authors build a registration model based on the Fisher-Rao Riemannian metric 
that is superior to many previously considered algorithms (F-R method). 

In Earls and Hooker [4], we obtain registration results similar to the F-R method using a 
Gaussian process model (GP). The extension of this model proposed in this paper improves on 
the F-R method for certain types of data. Here, we compare the registration results of F-R and 
of our GP model using two simulated data sets. 

The Sobolev Least Squares (sis) criterion is used to compare the functions registered using the 
GP model to those registered by F-R. This criterion compares the total cross-sectional variance 
of the first derivatives of the registered functions to that of the original functions. Explicitly, 

sis = -T7---G- 13 

In Srivistava, et.al. HZj, sis is seen as the best measure of alignment in comparison to two 
other criterion, a least squares criterion and a pairwise correlation criterion. Lower values of sis 
correspond to better function alignment. 

First Simulated Data Set The 21 unregistered functions are simulated using the algorithm 
originally proposed by Kneip and Ramsay [8] where the authors also consider registration in 
the context of multiple directions of functional variation. The registered functions X^h^t)), 
i = 1,... ,21, are defined as Xi(hi(t )) = cije - ' 5 ^ -1 ' 5 -* 2 + C 2 ,e~' 5( ' <+1 ' 5 ) 2 , t e [—3,3] where c\ t and C 2 i 
are iid N(l, .25 2 ). These functions are then warped so that /q(f) = —) — 3 if a* ^ 0, 

where a il i = 1,..., 21 are equally spaced between -1 and 1. If a* = 0, hi(t) = t. 

Data simulated in the same way are also registered using the F-R method in Srivistava, et.al. 
id- Here we again use their method to register the simulated unregistered functions for com¬ 
parison purposes. In Figure [3] are plots of the simulated unregistered functions and the functions 
registered using both the F-R algorithm and the proposed GP model. Both methods achieve a 
high degree of alignment with the GP model performing slightly better in respect to the sis crite¬ 
rion. The lower left frame of Figure [3] contains the two estimated factors to which these data are 
registered. 
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While the GP model performs similarly to F-R in this example, the added benefit of using the 
GP model is that the registered functions can be grouped according to their associated weights, 
zi and z 2 on each of the factors, i\ and f 2 . For functions that require registration to adequately 
describe the variability in the prominent features of the functions, attempting to classify functions 
with similar characteristics before registration will result in misclassihcations that are a byproduct 
of the principal components or factors of the unregistered functions not reflecting the important 
differences that exist in these functions. In our model, variability in the weights, Z\ and z 2 reflect 
variability in significant features of the original functions without phase distortion. The result 
is that functions with similar weights reflect functions with similar features. In Figure [4[ the 
registered functions are grouped by the estimated centered weights zi and z 2 ; all functions whose 
centered weights lie in the same quadrant are grouped together. 

Second Simulated Data Set Here we consider data with features that are not aligned 
well using traditional definitions of registration. Each of the 20 simulated registered functions is 
composed of a linear combination of two factors which is then subjected to a random warping 
to obtain a simulated unregistered function. The factors, fi and f 2 , from which these data are 
simulated are found in Figure [5j 

The alignment of these functions using the GP model is again compared to that obtained by 
F-R. For this example, the quality of alignment is best assessed by using the Sobolev Least Squares 
criterion separately for each of two groups of functions. Group 1 consists of functions for which 
Z 2 i > 0. The second group is characterized by functions for which < 0. The final sis value is 
the sum of the sis values for the two groups. 

In Figure [5] are plots of the simulated unregistered functions, the functions registered by F-R, 
and the functions registered by GP. Not only is the sis value lower for the GP model, visually 
it is apparent that functions registered by the GP model are better aligned. In this example the 
estimated factors closely resemble the original factors from which the data are simulated. These 
can be seen in Figure [6] Also, in Figure [6] are three groups of registered functions determined only 
by classifying the estimated weights on the second factor. 
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Figure 5: Second Simulated Data Set. Top Left The two factors used to simulate data before 
warping. Top Right Simulated unregistered functions. Lower Left Functions registered by F-R 
(R package ’fdasrvf’). Lower Right Functions registered by the GP model. 
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Figure 6: Three groups determined by the estimated weights on the second factor, z 2 . Top Left 
{Xi(hi(t)) : z 2i G [-.1, .1]}. Top Right {X^h^t)) : z 2i < -.1} Lower Left {X ; (hj(t)) : z 2i > .1} 
Lower Right Estimated factors, fi and f 2 , determined by the GP model. 


15 







4 THE JUGGLING DATA: REGISTRATION AND GROUP¬ 


ING 

The juggling data consist of three different functional data sets obtained by recording the finger 
position of Dr. Michael Newton (Biostatistics, University of Wisconsin) as he juggles. These 
data were collected in collaboration with Dr. James Ramsay (Psychology, McGill University), Dr. 
David Ostry (Psychology, McGill University), and Dr. Paul Gribble (Psychology, University of 
Western Ontario) and can be downloaded from http://mbi.osu.edu/programs/current_topic_ 
workshop/ under the link for the 2012 workshop, CTW: Statistics of Time Warpings and Phase 
Variations. As Dr. Newton juggled, the following were recorded: 1) the horizontal position of 
the right forefinger in the frontal plane, 2 )the horizontal position of the right forefinger in the 
sagittal plane, and 3) the vertical position of the right forefinger. For this data analysis, the 
first functional data set of the horizontal position of the right forefinger in the frontal plane is 
used to demonstrate functional data registration and grouping using our Gaussian process model. 
Additional information on this data set can be found in Ramsay and Silverman m- 

Description of the Juggling Data For this analysis, our observations consist of individual 
cycles. In each cycle, the right hand cycles smoothly oscillates from left to right as the ball is 
caught and released. Each functional observation begins at the apex of each cycle that corresponds 
to the X-coordinate of the juggler’s right forefinger immediately after releasing the ball with his 
right hand. From here, each function takes a sharp dip as the juggler’s hand moves to the left to 
catch the next ball. Variation in the X-coordinate of these cycles correspond to the adjustments 
made by the juggler after the initial movement to the left to account for differences between 
where the ball actually descends and where the juggler anticipates it to be. Of approximately 100 
cycles available, we randomly selected 25 to use for this analysis. All cycles are considered over 
a common time domain ranging from 0 to 675 milliseconds where the original data are recorded 
in 5 millisecond intervals. Thinning the data does not significantly alter its shape, and the final 
data contains 28 records per functional observation (cycle) taken every 25 milliseconds. 

The goal of this analysis is two-fold. The first aim is to align the prominent features in these 25 
cycles in conjunction with estimating the two primary factors of which these data are composed. 
Secondly, using the estimated weights, z x and z 2 , classify these functions into groups of functions 
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Figure 7: Juggling Data. Top Left Original unregistered functions. Top Right Functions 
registered by F-R (R package ’fdasrvf’). Lower Left Functions registered by the FA model. 
Lower Right Estimated factors, fi and fj, determined by the GP model. 
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Figure 8: Four groups determined by the centered weights on the first factor, z 1 , and the un¬ 
adjusted weights on the second factor, z 2 . Top Left {X^h^t)) : z H > 0,5 2 * > 0}. Top 
Right {Xi(hi(t)) : Zu > 0, z-n < 0} Lower Left {Xi(hi(t)) : Zu < 0, i 2) ; > 0} Lower Right 
{Xi(hi(t)) : z u < 0 ,z 2i < 0} 
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that share similar features. 

Figure [7] contains plots of the unregistered functions, the functions registered by F-R, the 
functions registered by GP and the first two estimated primary directions of variation in these 
functions. Here again, based on the sis criterion, the GP model provides a better function align¬ 
ment than F-R. 

The estimated registered functions are split into four groups based on the estimated weights 
for each function in each of the primary directions of variation. Since all estimated weights on the 
first factor were positive, we centered these weights to delineate functions with large weights on 
the first factor and those with smaller weights on the first factor. In contrast, the variation in the 
estimated weights on the second factor could be described by whether this weight was positive 
or negative. Four groups were determined by the magnitude of the estimated weight on the first 
factor and the sign of the estimated weight on the second factor. This is equivalent to grouping 
by the quadrant in which the centered weight on the first factor and the unadjusted weight on the 
second factor, (z Uj z 2 i), lays. 

Figure [8] contains the resulting four groups of functions. In cycles with large weights on the 
first factor, found in the top two plots, the peak found in these functions between 200 and 300 
milliseconds corresponds to the juggler overcompensating for moving his hand too far to the left 
to catch the ball by making a sharp movement to the right. This is then followed by another 
adjustment to the left. A positive weight on the second factor in the first group corresponds to 
cycles where the juggler needs to make another significant adjustment in his hand position between 
300 and 500 milliseconds. This not only results in another significant peak in these functions, but 
also corresponds to the juggler releasing the ball with his hand further to the left when compared 
to the previous cycle. This can be seen in these functions having a smaller X-coordinate at the 
end of the cycle than at the beginning of the cycle. Differences in the X-coordinate when the ball 
is released between the previous and current cycle are much less prominent in Group 2. Groups 3 
and 4 contain cycles with smaller estimated weights on the first factor. This is reflected in more 
subtle adjustments in hand position in the horizontal plane between 200 and 300 milliseconds 
where these functions stay relatively flat. Again, as seen in Groups 1 and 2, here we see that the 
sign of the weight on the second factor delineates between cycles where there is a distinct change 
in the X-coordinate at the time the ball is released between the previous and current cycle and 


19 


when there was not. Group 3 corresponds to cycles where the ball is released further to the left in 
comparison to the release point for the previous cycle while those in Group 4 contain cycles with 
similar release points to that of the previous cycle. 

This example illustrates how our model for registration and factor analysis uncovers functional 
differences and similarities that cannot be detected as effectively using traditional methods for 
registering functional data. Furthermore, since the weights on each factor for each function are 
additional unknowns in our model, we can quantify how certain we are a function belongs to a 
particular group by looking at the variability in the posterior sample of these weights. 

5 DISCUSSION 

In this paper, we have proposed a Bayesian hierarchical model for functional registration and 
factor analysis. This model reduces phase variability in functions that when registered have 
significant variation in more than one primary functional direction. We have shown for these 
types of functions, our registration model outperforms one of the best registration algorithms 
available. Furthermore, in addition to performing functional registration, with our model two 
primary directions of variation are estimated for the registered functions. Each registered function 
is primarily composed of a weighted combination of these factors, and by classifying the estimated 
weights on these factors, functions can easily be grouped. 

For this analysis, a Metropolis within Gibbs sampler is used to obtain MCMC samples from 
the joint posterior distribution of all parameters. In general, MCMC sampling is inefficient for 
high-dimensional models. However, in this particular model, reasonable estimates can be obtained 
by utilizing an adapted form of variational Bayes that significantly reduces computational costs. If 
MCMC sampling is preferred, more efficient sampling schemes are available for use. In particular, 
Calderhead [2] suggests that population MCMC can be employed to allow both global and local 
movement throughout the parameter space for a more efficient sampler and could be applied here. 
Initial estimates for an MCMC sampler should be obtained using AVB for optimal performance. 

This work was possible through the flexibility of prior assumptions in a Bayesian hierarchical 
model. We have shown for functional data these models can encompass a multitude of inferential 
procedures including latent function estimation, functional linear regression, functional registra¬ 
tion, and functional registration with factor analysis ( Earls and Hooker [3] and Earls and Hooker 
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0 )- This list however is not exhaustive, and, in general, combinations of these inferential proce¬ 
dures can be encompassed within a single model. For instance, in Earls and Hooker [1] we propose 
a model for registering latent functions. Another example might be to encompass functional linear 
regression and registration within the same model. The advantage to these types of models are 
that common pre-processing steps such as smoothing, registration, and covariance estimation can 
be included within the model so that uncertainty in these steps can be encompassed in the final 
inferential procedure. Future work will focus on exploring further extensions to these models and 
continuing to pursue greater computational efficiency for these models. 
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APPENDIX A 

Below, in detail, are the specifications for the hierarchical Bayesian registration and factor 
analysis model discussed in this paper. The first section includes the basic model for functional 
data registration and factor analysis also found in Section [2] Section A. 2 describes the MCMC 
sampling scheme for this model. 

A.l Factor Analysis 

As discussed in Section [2j the initial assumption of this model is that we are interested in 
registering and possibly grouping functional data, X The registered functions, 
Xi(hi(t))i i = 1... N, are assumed to be characterized almost completely by a linear combination 
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of two factors, j\ (t) and / 2 (f). Below are the data and prior distributions used for this model. 
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S is a hxed matrix designed to penalize variation in any direction from the corresponding 
mean of the distribution in which it is utilized. It is composed of two matrices. Pi and P 2 , such 
that X = P x + P 2 . Pi penalizes variation from the mean in constant and linear directions, and 
P 2 penalizes variation from the mean in directions of curvature. 

P 2 is also used to penalize curvature in the base functions and factors, f\(t) and / 2 (t), with 
associated smoothing parameters \ w and A/. Further details of the construction of P x and P 2 are 
found in Earls and Hooker [3]. 

A.2 MCMC Sampling 

Using these assumptions, the following full conditional distributions are derived to run a 
MCMC sampler. Note, this list will not include a full conditional for the base functions or 
registered functions as their priors are not conjugate. Instead, the base and registered functions 
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are sampled via a Metropolis step. 
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APPENDIX B 


B.l Adapted Variational Bayes 

The variational Bayes procedure described here is based on the variational methods proposed 
by Ornerod and Wand [12] and Bishop [1]. Their proposed method optimizes a lower bound of the 
marginal likelihood which results in finding an approximate joint posterior density that has the 
smallest Kullback-Leibler (KL) distance, Kullback and Leibler [9j, from the true joint posterior 
density. 

In minimizing the KL distance between the approximate and true posterior distribution, pa¬ 
rameters are updated by an optimization method that requires an approximate posterior density 
that not only factors but factors into components of known parametric forms. Suppose, q(0) 
is the approximated posterior joint distribution. Then for some partition of 6 = {9±,... ,9^}, 
<?(0) = nti Qk(Ok), where each distribution q k is of a known parametric form. 

In our model, the Gaussian process priors for the base functions, w t (t), i = 1,... ,N, are not 
conditionally conjugate to the likelihood function. Therefore, the traditional variational Bayes 
optimization method does not apply directly since ?fc(wj), % — 1,..., N are not known parametric 
distributions. Thus, we propose an adapted variational Bayes algorithm. 

After initializing all parameters, in each iteration, the adapted variational Bayes algorithm 
performs two steps. In the first step, the ‘likelihood’ as a function of the base functions is max¬ 
imized. For this ‘likelihood’, all other parameters are fixed at their current values. The second 
step uses a traditional variational Bayes iterative scheme to update all other parameters. Specif¬ 
ically, assuming 6 k = w*,, for k — 1... N, so that, 9 = (wi,..., 9m+i, ■ ■ ■, 9 ^}, the adapted 

variational Bayes algorithm is as follows: 

1. Initialize 9 

2. For each iteration, m, and each k, k = 1,..., N, update the estimate for 
w k so that w[. m) = snp Wk q k (w k \ 9 ( f l ~ 1 \j = (N + 1),..., d) 

3. For each iteration, m, and each k, k = (iV+1),..., d, update q k so that q^ n> oc explE^ ^log f(9 k 
rest)], where the expectation is taken with respect to the distributions q " 1 X) [9j), j = 

1 ,...,d,j^k 
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4. Repeat steps (2) and (3) until the desired convergence criterion is met 


This algorithm is guaranteed to converge. However, convergence is not guaranteed to a global 
maximum, and in practice it is sometimes necessary to adjust the registration and warping penal¬ 
ties as the functions become registered. An unregistered function that requires a substantial 
amount of warping can cause convergence to a local maximum due to the small penalty on warp¬ 
ing. The flexibility in warping allowed by this small penalty can cause the function to deform 
rather than register. This can be remedied in two ways. The first option might be to perform 
a simple initial warping for this function that prevents the optimization from falling into a local 
mode. The second option is to adjust the registration and warping parameters over time. Initially 
a stronger warping penalty is employed to prevent function deformation. Then, as the functions 
register, the warping penalty can be reduced to allow for a more complete registration. When 
initializing an MCMC sampler, the final penalties on warping and registration from the adapted 
variational Bayes algorithm should be used. For further information on the convergence properties 
of the adapted variational Bayes algorithm and an analysis of how well adapted variational Bayes 
estimates correspond to MCMC estimates, see Earls and Hooker [Tj. 

Below are the approximate posterior distributions, qk{Ok), k — (N + 1),... ,d, for the adapted 
variational Bayes estimation procedure for the registration and factor analysis model. Note, the 
subscripts on the q distributions has been omitted. For a more thorough discussion and illustration 
of how the optimal q distributions are derived see Goldsmith et. al. [6]. 


26 


?( f l) 

rs-/ 

^p(l J/ q( fi)j ^?(fi)) 

<?(f 2 ) 

rs-/ 

j/ ' v p(/ X g(f 2 )) ^<?(f2)) 

q(zoi) 

r^> 

N{^q(z 0 i)i a q(z 0i )) 

q(°l 0 ) 


IG( a q(cr% o ),bq( cT 2j) 

q(zu) 

r 

N{Hq( Zl i)i <J q(z li )) 

<?«) 


IG( a q ^2^,bq^2^) 

q{*2 i) 


N{^q(z 2 i)l a q(z 2 i)) 



IG( a q (cr2 2 ),bq( a | 2 )) 

q(vf) 

r\^ 

G{Cq(r]f)i dq(rif)) 

?(*/) 

r^j 

G(c q (\f)i dq^Xj.')) 


The approximate joint posterior distribution of all parameters except the base functions is 
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As the q densities are all of known distributional forms, updating these densities is equivalent 
to updating their parameters. For each iteration, the following parameters are updated for the q 


densities found in (14). These updates arc listed in an order that allows the convergence criterion 
to be calculated. Further details on the convergence criterion can be found in Appendix B.2. 
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B.2 Convergence Criterion 

The adapted variational Bayes algorithm is run until changes in 
^(9-w) [% /(X, W, 0_ w ) - % 9 (0_ w )] are below a certain threshhold. This value can be com¬ 
puted in each iteration as follows. 
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Now looking at each piece individually, 
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where C is a constant that does not change from one iteration to the next. Similarly, 
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Eq( 0 - w) [log /(A/) - log g(A/)] = E q{0 _ m) log —- + (c - 1 )log X f - d\ f 
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The expression for E q ( 0 _^ [log /(X, w,£G w ) — log g(0_ w )] can be simplified much further by 
combining terms that cancel out. However, in some cases the ability to cancel terms depends on 
the order of the updates. For instance, in the expression, E^o ^ [log /(<v? 0 ) — log g(<r? 0 )], the terms 

~ b l l q( ) alld b q(°%)Pq(^) CallCcl with “1%,^ ( ( a q(z 0 i) + Pq(z oi ) )) fr ° m ^(0-w) /( Z o)- 

log g(z 0 )] as long as the parameters of q( z 0 ) are updated before b q (a? o )- For convenience, we have 
taken account the ordering necessary to compute the convergence criterion in the updates given 
above. Additionally, note all components in this expression that do not change from one iteration 
to the next can be ignored. 
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